feat: add focused LLM slice files (llms-api.txt, llms-node-ops.txt) by crtahlin · Pull Request #802 · ethersphere/bee-docs

crtahlin · 2026-05-13T14:05:29Z

Summary

Adds two task-specific documentation bundles so AI coding agents can load only what's relevant instead of the full 630KB llms-full.txt:

- llms-api.txt (220KB, 20 docs): API usage, uploads, stamps, feeds, chunks, encryption, developer tooling
- llms-node-ops.txt (225KB, 22 docs): installation, configuration, monitoring, staking, backups, upgrades, FAQ

Uses the customLLMFiles option of docusaurus-plugin-llms. Glob patterns pick up new pages automatically, so no manual maintenance is needed.
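For readers who have not seen the option, a minimal sketch of what such a configuration might look like follows. It is not the PR's actual config: the `title` values and the short `includePatterns` lists are illustrative assumptions, and only the option names `customLLMFiles`, `includePatterns`, `description`, `rootContent` and `fullContent` are taken from this thread.

```javascript
// Hypothetical excerpt of the plugin options. Values are illustrative only.
const llmsPluginOptions = {
  customLLMFiles: [
    {
      filename: 'llms-api.txt',
      title: 'Bee API and developer docs', // assumed title
      description: 'API usage, uploads, stamps, feeds, chunks, encryption, developer tooling',
      fullContent: true,
      // Exact paths: a typo here matches nothing and the page is left out.
      includePatterns: [
        'docs/develop/access-control.md',
        'docs/develop/tools-and-features/gateway-proxy.md',
      ],
    },
    {
      filename: 'llms-node-ops.txt',
      title: 'Bee node operations docs', // assumed title
      description: 'Installation, configuration, monitoring, staking, backups, upgrades, FAQ',
      fullContent: true,
      // Directory glob: new pages under this folder are picked up automatically.
      includePatterns: ['docs/bee/installation/**'],
    },
  ],
};

// In docusaurus.config.js the object would be passed to the plugin, e.g.:
//   plugins: [['docusaurus-plugin-llms', llmsPluginOptions]]
console.log(llmsPluginOptions.customLLMFiles.map((f) => f.filename));
```

Only the directory glob is self-maintaining. Entries given as exact file paths still need updating when a file is renamed.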

Also adds discovery links in static/llms.txt and extends the validation script.

Refs: ethersphere/DevRel#840

Maintenance

- Auto-maintained: slice patterns use directory globs, so new docs added under docs/develop/ or docs/bee/installation/ are automatically included
- Validation: scripts/validate-llms-txt.mjs logs referenced slice files
- No manual file curation needed: the plugin generates from patterns at build time

Test plan

- `npm run build` succeeds and both files are generated
- llms-api.txt: 20 documents, correct header/rootContent
- llms-node-ops.txt: 22 documents, correct header/rootContent
- static/llms.txt references both slices
- Validation script detects slice references

Add two task-specific documentation bundles via docusaurus-plugin-llms customLLMFiles, so AI agents can load only the docs relevant to their task instead of the full 630KB llms-full.txt:

- llms-api.txt (220KB, 20 docs): API usage, uploads, stamps, feeds, chunks, encryption, developer tooling
- llms-node-ops.txt (225KB, 22 docs): installation, configuration, monitoring, staking, backups, upgrades, FAQ

Both use glob patterns, so new pages added under those directories are automatically included and no manual maintenance is needed.

Also adds slice file references to static/llms.txt for agent discovery, and extends the validation script to log referenced slice files.

Refs: ethersphere/DevRel#840

netlify · 2026-05-13T14:05:36Z

✅ Deploy Preview for test-twitter-preview-testing-3 ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | `22ac0bd` |
| 🔍 Latest deploy log | https://app.netlify.com/projects/test-twitter-preview-testing-3/deploys/6a0c8a6679911b0008fe9ed3 |
| 😎 Deploy Preview | https://deploy-preview-802--test-twitter-preview-testing-3.netlify.app |

darkobas2

Nice idea — the slices are exactly what agents need to skip the 630KB monolith. A few things before merge:

1. `CLAUDE.md` looks like a personal config leaking into the upstream repo

- **Always use the `crtahlin` fork/repo** for creating issues, branches, and all GitHub operations — never the upstream `ethersphere` repo.

This is committed to ethersphere/bee-docs (the upstream). If another contributor clones the repo and uses Claude Code, they'll get told to push to your fork. That's clearly not what you want for everyone else.

Two options:

- Keep CLAUDE.md here but strip personal/workflow instructions, leaving only the things that apply to every contributor (project overview, commands, architecture, conventions).
- Move the personal bits to ~/.claude/CLAUDE.md (user-scoped, not committed) or to a .claude/ file you keep in your fork only.

Same goes for **Never mention Claude** — that's a defensible project rule, but you might want to phrase it as "no AI-attribution noise in commits/issues" since CLAUDE.md itself is literally mentioning Claude in the repo.

2. Two `includePatterns` don't match any file — silently dropped

I checked the actual docs/develop/ tree against the patterns:

- `docs/develop/act.md`: doesn't exist. The file you almost certainly want is `docs/develop/access-control.md` (ACT == Access Control Trie).
- `docs/develop/gateway-proxy.md`: doesn't exist as a top-level develop file. There's `docs/develop/tools-and-features/gateway-proxy.md` (which is already listed) and `docs/develop/gateway.md` (not listed; was that the intended addition?).

The plugin silently drops patterns that don't match, which is why your build still passes and the PR description's "20 docs" count comes out right — but you're losing access-control content.

3. Validation script doesn't catch the above

const sliceRe = /https:\/\/docs\.ethswarm\.org\/(llms-[a-z-]+\.txt)/g;

This only confirms that names referenced in llms.txt look like slice file names; it doesn't verify that any includePatterns actually resolve to files. A typo in customLLMFiles (like act.md) is invisible until someone diffs the generated slice contents.
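To make the limitation concrete, here is a small sketch that runs the same regular expression over a made-up llms.txt excerpt. The excerpt and the third, non-existent slice name are assumptions for illustration; only `sliceRe` comes from the PR.

```javascript
const sliceRe = /https:\/\/docs\.ethswarm\.org\/(llms-[a-z-]+\.txt)/g;

// Hypothetical llms.txt excerpt; the last link points at a slice nobody generates.
const llmsTxt = [
  '- [API slice](https://docs.ethswarm.org/llms-api.txt)',
  '- [Node ops slice](https://docs.ethswarm.org/llms-node-ops.txt)',
  '- [Typo slice](https://docs.ethswarm.org/llms-does-not-exist.txt)',
].join('\n');

// Collect the captured file names from every match.
const referenced = [...llmsTxt.matchAll(sliceRe)].map((m) => m[1]);
console.log(referenced);
// → [ 'llms-api.txt', 'llms-node-ops.txt', 'llms-does-not-exist.txt' ]
```

The regex accepts any name of the right shape, so the check says nothing about whether the slice, or the pages behind it, actually exist.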

Worth adding a check that resolves the globs and warns on patterns matching zero files. Same idea as --fail-on-glob-mismatch in other tooling. Otherwise this regresses silently next time someone refactors filenames.
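A rough sketch of the zero-match check suggested above, under simplifying assumptions: `globToRegExp` and `findUnmatchedPatterns` are hypothetical helpers, the matcher understands only `*` and `**`, and the file list is passed in rather than read from disk. A real script would more likely lean on an existing glob library.

```javascript
// Convert a simple glob (only `*` and `**` are supported) into a RegExp.
function globToRegExp(glob) {
  const escapeLiteral = (s) => s.replace(/[.+^${}()|[\]\\?]/g, '\\$&');
  const source = glob
    .split('**') // `**` may cross directory separators
    .map((part) => part.split('*').map(escapeLiteral).join('[^/]*')) // `*` stays in one segment
    .join('.*');
  return new RegExp(`^${source}$`);
}

// Return every pattern that matches none of the known files.
function findUnmatchedPatterns(patterns, files) {
  return patterns.filter((pattern) => {
    const re = globToRegExp(pattern);
    return !files.some((file) => re.test(file));
  });
}

// Illustrative file list; the installation page name is made up.
const files = [
  'docs/develop/access-control.md',
  'docs/develop/tools-and-features/gateway-proxy.md',
  'docs/bee/installation/example.md',
];
const patterns = [
  'docs/develop/act.md',
  'docs/develop/gateway-proxy.md',
  'docs/develop/access-control.md',
  'docs/bee/installation/**',
];

const unmatched = findUnmatchedPatterns(patterns, files);
for (const pattern of unmatched) {
  console.warn(`includePatterns entry matches no files: ${pattern}`);
}
```

With this input the two broken paths from item 2 are the only entries reported.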

Minor

- `description` and `rootContent` mostly duplicate each other. If both are emitted into the slice header that's fine, but it's worth confirming with a quick `head` on the generated files.
- `fullContent: true` on both: is there a use case for the trimmed-content variant for an even smaller slice? Not blocking, just curious about the size/value tradeoff.

Architecture (customLLMFiles + auto-include via globs + discovery links in llms.txt) is solid. Fix items 1 and 2, ideally 3, and this is good to go.

darkobas2

Nice cleanup — the slice idea is exactly what coding agents need. Three things to address before merge:

🔴 Two `llms-api.txt` paths don't exist in the repo

Cross-checked the includePatterns array against master:

| Pattern | Status |
| --- | --- |
| `docs/develop/act.md` | ❌ does not exist (likely meant `docs/develop/access-control.md`) |
| `docs/develop/gateway-proxy.md` | ❌ does not exist (the file is at `docs/develop/tools-and-features/gateway-proxy.md`, which is already listed; looks like an accidental duplicate at the wrong path) |

The array has 22 entries but the PR description / build log shows "20 documents" — i.e. these two paths matched nothing and were silently dropped. The slice is missing ACT (access control) entirely, which is exactly the kind of thing a developer agent will be asked about.

Fix: replace docs/develop/act.md with docs/develop/access-control.md, and drop the duplicate docs/develop/gateway-proxy.md line.

🟡 `CLAUDE.md` is personal config, not upstream config

Two rules in the added file are author-specific and don't belong in ethersphere/bee-docs:

Always use the crtahlin fork/repo for creating issues, branches, and all GitHub operations — never the upstream ethersphere repo.

Never mention Claude in any commit messages, issue titles, issue bodies, branch names, or any other visible output. Do not reference AI assistance.

Both of these are workflow preferences for you working from your fork. If another contributor (or another agent) clones upstream and reads this CLAUDE.md, they'll be told to push to your fork — which is wrong. Suggest either:

- Drop the CLAUDE.md from this PR entirely and keep it as a local-only file (gitignored), or
- Keep an upstream CLAUDE.md but limit it to repo-neutral content (project overview, build commands, conventions) and strip the personal rules.

The "Project Overview / Architecture / Commands / Conventions" sections are useful and worth keeping if you split them.

🟡 `validate-llms-txt.mjs` change doesn't actually validate

The new block is labelled "Verify slice file references … point to files that will be generated" but the code just logs the references — it doesn't check the include patterns resolve to real markdown files. Had it actually globbed includePatterns against the filesystem, it would have caught the two missing paths above. Worth tightening, since the whole "auto-maintained via globs" story relies on these patterns not silently no-op-ing.

✅ Looks good

- Glob-based docs/bee/installation/** for llms-node-ops.txt: all 11 install pages resolve.
- All 12 working-with-bee/* paths in llms-node-ops.txt exist.
- static/llms.txt discovery links are clean.
- Splitting a 630KB blob into 220KB / 225KB task-specific bundles is the right move for context budgets.

- Remove CLAUDE.md from repo (personal config) and add to .gitignore
- Replace docs/develop/act.md with docs/develop/access-control.md
- Remove duplicate docs/develop/gateway-proxy.md
- Add includePatterns file-existence check to validate-llms-txt.mjs: globs checked via globSync, exact paths via readFileSync. This would have caught both broken patterns before this review.

crtahlin · 2026-05-19T16:07:04Z

Thanks for the thorough review — all three items addressed in 22ac0bd:

- Broken paths: replaced docs/develop/act.md with docs/develop/access-control.md, removed duplicate docs/develop/gateway-proxy.md. Include patterns: 22 → 20 entries, all resolving.
- CLAUDE.md: removed from the PR entirely and added to .gitignore. Keeping it as a local-only file in my fork.
- Validation script: added step 4 to validate-llms-txt.mjs that checks every includePatterns entry against the filesystem (glob patterns via globSync, exact paths via readFileSync). Would have caught both broken paths before the review.
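The split between glob patterns and exact paths could look roughly like the sketch below. This is not the code from 22ac0bd: `isGlob` and `checkIncludePatterns` are hypothetical names, and the filesystem access is injected as `globSync` and `fileExists` so the example runs without touching disk. In the real script those roles would be played by a glob implementation and by a `readFileSync` call wrapped in try/catch.

```javascript
// Treat a pattern as a glob if it contains any glob metacharacter.
const isGlob = (pattern) => /[*?[\]{}]/.test(pattern);

// Return the patterns that resolve to nothing.
// `globSync(pattern)` must return an array of matching paths;
// `fileExists(path)` must return true when the exact path can be read.
function checkIncludePatterns(patterns, { globSync, fileExists }) {
  const missing = [];
  for (const pattern of patterns) {
    const ok = isGlob(pattern)
      ? globSync(pattern).length > 0
      : fileExists(pattern);
    if (!ok) missing.push(pattern);
  }
  return missing;
}

// Stubbed filesystem for illustration only.
const knownFiles = new Set(['docs/develop/access-control.md']);
const missing = checkIncludePatterns(
  ['docs/develop/act.md', 'docs/develop/access-control.md', 'docs/bee/installation/**'],
  {
    globSync: (p) => (p === 'docs/bee/installation/**' ? ['docs/bee/installation/example.md'] : []),
    fileExists: (p) => knownFiles.has(p),
  },
);
console.log(missing); // → [ 'docs/develop/act.md' ]
```

Exiting with a non-zero status when `missing` is non-empty would let a CI step fail the build instead of only logging a warning.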

darkobas2

Re-reviewed. All three prior blockers are addressed:

- CLAUDE.md removed from commit; now in .gitignore ✓
- act.md → access-control.md and the gateway-proxy path fixed; verified all 22 API + 11 node-ops paths + docs/bee/installation/** resolve on the head commit ✓
- Validation script now walks includePatterns and warns on missing files ✓

One small nit (won't block): the validator isn't wired into a package.json script or CI workflow, so the new pattern check won't actually run unless invoked manually. Worth a follow-up npm run validate-llms + a CI step.

LGTM, ready to merge.

crtahlin added 3 commits April 23, 2026 10:25

- Sync: 2026-04-23 (`955b47b`)
- Merge remote-tracking branch 'upstream/master' (`6bb46cc`)

crtahlin requested a review from darkobas2 May 18, 2026 08:50

darkobas2 reviewed May 18, 2026

darkobas2 requested changes May 18, 2026

crtahlin requested a review from darkobas2 May 19, 2026 16:07

darkobas2 approved these changes May 19, 2026

crtahlin wants to merge 4 commits into ethersphere:master from crtahlin:feat/llm-slice-files
